Integrative analysis identifies gene signatures mediating the effect of DNA methylation on asthma severity and lung function

DNA methylation (DNAm) changes play a key role in regulating gene expression in asthma. To investigate the role of epigenetics and transcriptomics change in asthma, we used publicly available DNAm (asthmatics, n = 96 and controls, n = 46) and gene expression (asthmatics, n = 79 and controls, n = 39) data derived from bronchial epithelial cells (BECs). We performed differential methylation/expression and weighted co-methylation/co-expression network analyses to identify co-methylated and co-expressed modules associated with asthma severity and lung function. For subjects with both DNAm and gene expression data (asthmatics, n = 79 and controls, n = 39), machine-learning technique was used to prioritize CpGs and differentially expressed genes (DEGs) for asthma risk prediction, and mediation analysis was used to uncover DEGs that mediate the effect of DNAm on asthma severity and lung function in BECs. Finally, we validated CpGs and their associated DEGs and the asthma risk prediction model in airway epithelial cells (AECs) dataset. The asthma risk prediction model based on 18 CpGs and 28 DEGs showed high accuracy in both the discovery BEC dataset with area under the receiver operating characteristic curve (AUC) = 0.99 and the validation AEC dataset (AUC = 0.82). Genes in the three co-methylated and six co-expressed modules were enriched in multiple pathways including WNT/beta-catenin signaling and notch signaling. Moreover, we identified 35 CpGs correlated with DEGs in BECs, of which 17 CpGs including cg01975495 (SERPINE1), cg10528482 (SLC9A3), cg25477769 (HNF1A) and cg26639146 (CD9), cg17945560 (TINAGL1) and cg10290200 (FLNC) were replicated in AECs. These DEGs mediate the association between DNAm and asthma severity and lung function. Overall, our study investigated the role of DNAm and gene expression change in asthma and provided an insight into the mechanisms underlying the effects of DNA methylation on asthma, asthma severity and lung function. Supplementary Information The online version contains supplementary material available at 10.1186/s13148-023-01611-9.


Introduction
Recent advances in high-throughput multi-omics approaches have enabled the collection of molecular assessments at different layers, providing complementary perspectives of complex diseases including asthma.While having a family history is a known risk factor for developing asthma [1], the low concordance rate of asthma between monozygotic twins suggests that epigenomics and transcriptomics profiling and/or environmental factors may play substantial role in asthma pathogenesis [2].
DNA methylation (DNAm) may serve as surrogate endpoint for environmental exposures and could serve as a potential biomarker in the association between environmental factors and asthma endotypes [3,4].Reduced lung function is a hallmark of asthma disease.Lung function measures such as forced expiratory volume in one second (FEV1) and forced vital capacity (FVC) are strongly associated with decline in airflow in asthma [5][6][7].Many genomics studies of asthma have been performed either on gene expression data or DNAm data alone.For example, there were studies that showed the association of DNAm with lung function and asthma [8,9], as well as a study that showed global changes of DNAm affect gene expression derived from BECs in asthma [10].Furthermore, Zhang et al. revealed differentially expressed genes as well as enrichment of neutrophil activation and cytokine receptor pathways for severe asthma [11].Thürmann et al. showed that DNAm plays key roles in various biological processes such as regulating gene expression in childhood asthma [12].Perry et al. identified differentially methylated DNAm CpG sites that might play a role in regulating several key asthma-related genes such as DBX2, ACP6 and KCNJ11 [13].However, the results of their study were primarily using either DNAm or gene expression alone, and the role of DNAm on gene expression associated with asthma status, severity and lung function was not thoroughly investigated through integrated analysis of DNAm and gene expression.Epigenetic and transcriptomic profiling are well suited to identify novel genes and pathways involved in asthma pathogenesis [14,15].
In this study, we address the limitations of the previous asthma studies with the aim of identifying novel epigenetic regulation of gene expression associated with asthma severity and lung function through integrated analysis of DNAm and gene expression.The aims of the integrated analysis were to: (1) identify differentially comethylated or co-expressed network modules associated with asthma status, severity and lung function; (2) identify dysregulated biological function within these identified modules and unveil shared pathways underlying asthma severity and lung function; (3) identify significant CpGs and DEGs using machine learning and construct multi-omics asthma risk predictive models; and (4) infer the mediational roles of gene expression on the relationships between DNA methylation changes and asthma severity as well as lung function in BECs.A subset of CpGs-DEGs pairs showed significant association in asthma and were used to validate the prediction model in AECs dataset.Overall, our results provide an insight into the role of DNAm in asthma risk, severity and identify important genes that may mediate the effect of DNAm changes on asthma severity and lung function.

Construction of co-methylation networks and identification of modules associated with asthma severity and lung function
Initially, we conducted differential methylation analysis between asthmatic subjects (n = 96) and controls (n = 46) based on DNAm data (M-values) adjusting for covariates including age, sex, smoking status and the first three ancestry PCs in the discovery DNA methylation dataset.The differential analysis identified 1845 differentially methylated CpGs (DMCs, including 1500 hyper-methylated and 345 hypo-methylated) in asthmatic subjects compared to controls (adjusted P value < 0.05 and Δ meth > 1%; Fig. 2A).We reported the methylation difference (effect size, Δ meth ) between asthmatic subjects and controls as a percentage (e.g., effect size of 0.01 = 1%).The 1845 DMCs whose effect size ranged from − 14.1 to 21.0% were further analyzed to identify co-methylation networks/modules using WGCNA [16] package and the identified modules were tested for association with asthma severity, lung function and asthma-relevant clinical measures.In co-methylation network analysis, the suitable soft threshold power (β) = 7 was used as the correlation coefficient threshold to ensure relatively balanced mean connectivity and scale free network (Additional file 1: Fig. S1A).WGCNA analysis clustered the 1845 DMCs into four modules with 95-1382 methylated CpGs per module (i.e., the largest is turquoise with 1382 CpGs, 723 genes) (Fig. 2B and Additional file 2: Table S1).
The gray module contained uncorrelated CpG sites and hence was not considered in downstream analyses.Notably, module eigenCpGs from three co-methylation modules including blue, brown, and turquoise modules were significantly associated with asthma relevant clinical measures including asthma severity, and lung function measures (Fig. 2C; P value < 0.05).Turquoise and brown modules were found to be positively associated with asthma severity, while blue module was negatively associated with asthma severity.Age was associated with blue and turquoise modules.Clinical phenotypes such as IgE, FEV1, FEV1/FVC and BAL eosinophilia were significantly associated with all three modules.BMI was associated with blue, brown, and turquoise modules, whereas blood eosinophilia was significantly associated with brown module.

Construction of co-expression networks and identification of modules associated with asthma severity and lung function
For co-expression network analysis, we also first performed differential analysis between asthmatic subjects (n = 79) and controls (n = 39) using normalized Fig. 1 The overall workflow of the study.Initially, preprocessed, and normalized gene expression and DNA methylation data were downloaded and analyzed for differential methylation and expression analyses and weighted correlation network analysis (WGCNA) to generate DEG and DMC modules associated with asthma severity and lung function.Then, both DEGs and DMCs (adjusted P value < 0.05) in modules significantly associated with asthma severity and lung function (P value < 0.05) were selected.Finally, key DMCs and DEGs were selected to develop asthma-risk prediction models (methylation-based risk score (MRS) and transcriptomic-based risk score (TRS).Mediation analysis was conducted to select DMCs regulated DEGs in BECs RNA-seq data (adjusted for covariates including age, gender, smoking status and the first three ancestry PCs).The differential analysis identified a total of 1891 DEGs including 1058 downregulated and 833 upregulated genes in asthmatic subjects compared with controls (absolute value log2 (Fold change) > 0.1 and adjusted P value < 0.05) (Fig. 3A).Next, we performed WGCNA analysis to cluster and characterize correlation structure of the 1891 DEGs.WGCNA analysis using suitable β = 10 (Additional file 1: Fig. S1B) was conducted, and a total of seven co-expression modules with 32-785 genes per module (Fig. 3B) were identified excluding the gray module.The module eigengenes of the rest seven co-expression modules were significantly associated asthma severity and asthma relevant clinical features including lung function measures (Fig. 3C; P value < 0.05).The modules turquoise, brown, and green were negatively associated with asthma severity.The modules red, black, blue, and yellow were positively associated with asthma severity.Among clinical phenotypes, IgE was significantly associated with yellow and black modules.BMI was associated with red, brown, and turquoise modules.FEV1/FVC and BAL eosinophilia were significantly correlated with all modules.FEV1 and blood eosinophilia were correlated with all modules except red module.

Functional enrichment analysis of co-methylated and co-expressed modules Pathway analysis of co-methylated modules
Methylation CpGs in each module were mapped to genes based on IlluminaHumanMethylation450kanno.ilmn12.hg19 in the minfi package, and these co-methylated module genes were used for IPA analysis.Significantly enriched IPA results of three asthma severity and lung function associated co-methylation modules (turquoise, brown and blue) are shown in Fig. 4A-C and Additional file 2: Table S2.The 723 unique genes in turquoise co-methylation module were enriched in several pathways including pulmonary healing signaling, WNT/beta-catenin signaling, HOTAIR regulatory, regulation of epithelial development, and axonal guidance signaling pathways (Fig. 4A).The 70 genes associated with the brown co-methylation module were significantly enriched for pathways such as notch signaling, role of macrophages, fibroblasts and endothelial cells in rheumatoid arthritis, epithelial adherens junction signaling and WNT/beta-catenin signaling (Fig. 4B).The 205 mapped genes associated with blue co-methylated module were enriched in G-protein coupled receptor signaling and chronic myeloid leukemia signaling (Fig. 4C).

Overrepresentation of epigenetic regulated genes in the co-expression modules
We conducted hypergeometric tests on overlapping genes between the co-expression modules and the comethylation modules that are associated with asthma severity and lung function to identify epigenetic regulated genes that overrepresented in the co-expression modules.The analysis showed that genes in three comethylated modules of interest were overrepresented in the six co-expressed modules (turquoise, brown, green, red, blue, and yellow) (Additional file 2: Tables S1 and  S3).

Pathway analysis of co-expressed modules
The results of significantly enriched pathways for genes in the co-expression modules of interest are shown in Fig. 5A-D and Additional file 2: Table S4.The 785 unique genes associated with turquoise module were significantly enriched in several biological pathways such as RHO GTPase cycle, generic transcription pathway and WNT/beta-catenin signaling (Fig. 5A).The 61 unique genes associated with red module were significantly enriched in several biological pathways such as pathogen induced cytokine storm signaling, activin inhibin signaling and STAT3 Pathway (Fig. 5B).The 103 genes associated with the green module were mainly involved in protein kinase signaling and notch signaling (Fig. 5C).The 140 genes associated with the brown module were involved in axonal guidance signaling (Additional file 2: Table S4).Notably, uniquely annotated genes shared between the co-methylation modules and genes co-expression modules were enriched in several pathways including axonal guidance signaling, notch signaling, WNT/beta-catenin signaling, pulmonary fibrosis idiopathic signaling and myelination signaling (Additional file 1: Fig. S2), suggesting DNA methylation changes in these pathways may potentially regulate genes in asthma pathogenesis.

Construction and validation of asthma multi-omics risk model
To construct a multi-omics asthma prediction model in the discovery BECs datasets, we used Boruta method to identify important CpGs and genes whose DNAm or expression that showed high asthma-risk prediction performance.The results showed that 18 DMCs and 28 DEGs yielded the "confirm" status in Boruta iterations in methylation and expression data, respectively, in the discovery BECs datasets (Additional file 2: Tables S5  and S6) with high asthma risk prediction performance.Next, based on 18 important CpGs and 28 genes, we constructed Methylation risk score (MRS) and transcriptome risk score (TRS), respectively, and their asthma-risk prediction performances were assessed.The asthma-risk prediction MRS model had an AUC of 0.92, and the TRS model had an AUC of 0.98 in the discovery datasets.We also constructed multi-omics risk model by integrating MRS and TRS models based on 118 samples having both DNA methylation and gene expression data and this model revealed high asthma diagnostic performance with AUC = 0.99 (Fig. 6A).The MRS is significantly related to asthma severity (Additional file 1: Fig. S3A), FEV1 (r = − 0.43, P value = < 0.001; Additional file 1: Fig. S3B) and FEV1/ FVC (r = − 0.5, P value = < 0.001; Additional file 1: Fig. S3C).The TRS also associated with asthma severity (Additional file 1: Fig. S3D), FEV1 (r = − 0.23, P value < 0.05; Additional file 1: Fig. S3E) and FEV1 (r = − 0.33, P value < 0.001; Additional file 1: Fig. S3F).Notably, the risk scores significantly associated with several asthma clinical features such as BAL eosinophilia, BAL neutrophils, IGE, and blood eosinophilia (Additional file 1: Figs.S4 and S5).To validate the robustness of our developed asthma multi-omics risk model, we constructed 18 CpG sites-based MRS model, and 28 genesbased TRS model using 81 samples having both DNA methylation and gene expression data in validation AECs dataset.The diagnostic ability of multi-omics asthma risk

Integrating DNA methylation and gene expression data
We performed correlation analysis between methylation levels of DMCs and the expression level of DEGs in the modules associated with asthma severity and lung function to explore the biological relevance of the DMCs in gene expression.Initially, the 1845 DMCs were mapped to 1035 genes.Then, correlation analysis between the methylation levels of DMCs and gene expression levels identified a total of 35 significantly correlated DMC-DEG pairs (Fig. 7A-F and Additional file 2: Table S7) including cg14015211 (TLR5), cg21385480 (LRIG1), cg01975495 (SERPINE1), cg10528482 (SLC9A3), cg25477769 (HNF1A) and cg26639146 (CD9).Using these results, we conducted a mediation analysis for each of the 35 DMC-DEG pairs to examine direct effect (CpG methylation → asthma severity or FEV1) and mediational effect (CpG methylation → gene expression → asthma severity or FEV1) pathways adjusting for age, gender, and ancestry PCs.We identified several significant mediation genes (Table 2 and Additional file 2: Table S8) including INAGL1, SERPINE1, TLR5, SLC9A3 and CD9 for asthma severity and INAGL1, TLR5, CD9 for FEV1 (Table 2 and Additional file 2: Table S9), suggesting potential regulation effect of DNAm on gene expression in asthma.Moreover, we found a significant direct effect of DNAm CpG on asthma severity and FEV1 after adjusting for age, gender, ancestry, and mediator gene (Table 2, Additional file 2: Tables S7 and S8).

Validation of differential methylation CpG sites associated with DEGs in an independent AEC dataset
To validate the 35 DMC-DEG pairs associated with asthma severity and lung function identified in BECs, we further used an independent AECs validation dataset.First, we conducted differential methylation analysis between asthmatic subjects (n = 74) and Controls (n = 41) in AECs adjusting age and gender and showed that 29 CpG sites out of 35 DMCs identified in BECs were found to be differentially methylated in AECs (Table 3) with the same directional effect (adjusted P value < 0.05 and Δ meth > 1%).Next, we examined correlation between 29 DMCs and their mapped DEGs based on 81 samples having both DNA methylation and gene expression data in validation AECs dataset.We validated 17 DMCs significantly correlated with DEGs (Fig. 8A-F and Additional file 1: Fig. S6) including cg01975495 (SER-PINE1), cg10528482 (SLC9A3), cg25477769 (HNF1A)  and C. green co-expression modules that are significantly (P value < 0.01) associated with asthma severity and/or lung function and cg26639146 (CD9), cg17945560 (TINAGL1) and cg10290200 (FLNC).More importantly, several of these identified DEGs that are correlated with DMCs have been implicated in asthma [17][18][19].

Discussion
In this study, DNAm and gene expression change in BECs of asthmatics and controls were analyzed at a genomewide scale.Based on differential methylation, expression, and weighted correlation network analysis, we identified co-methylation modules and co-expressed modules that The functional enrichment analysis of co-methylation module genes and co-expression module genes showed enrichment in several shared pathways including axonal guidance signaling, notch signaling, WNT/beta-catenin signaling, pulmonary fibrosis idiopathic signaling and myelination signaling.Subsequently, machine learning methods identified the most informative subset of DMCs and DEGs that leads to better asthma diagnostic performance.Moreover, correlation analysis followed by mediation analysis identified potential DNAm regulated DEGs that were associated with asthma severity and lung function and the findings were validated using an independent AECs dataset.
In this study, we first identified DMCs and DEGs in asthmatic samples compared to controls and further conducted weighted correlation network analysis to identify network modules associated with asthma severity and lung function and identified several shared pathways including WNT/beta-catenin and notch signaling.The Wnt/beta-catenin signaling pathway regulates airway remodeling, which is involved in chronic asthma [20].Indeed, several genes identified in this study are enriched in WNT/beta-catenin signaling, including FZD7 and HNF1A, suggesting methylation signals associated with the activation WNT/beta-catenin signaling in the development of asthma pathology.Moreover, differentially methylated CpGs annotated genes and differentially expressed genes (DTX1, MAML2, and NOTCH1) are involved in notch signaling and this result is consistent with previous finding, showing DNAm associated with notch signaling pathway activation in severe asthmatic subjects [13].The notch signaling regulates the immune response and a therapeutic target for asthma [21].
To evaluate predictive performance of DMCs and DEGs, we used machine learning methods to identify important DMCs and DEGs and constructed asthma risk prediction models.Most previous studies that constructed asthma risk prediction model were limited to either using CpGs or DEGs [22], and there were limited studies developed risk model based on integrated transcriptomic and DNAm data.A recent study by Liu et al. [23] used co-expression and co-methylation network analyses to identify genetic signatures and methylation pathways associated with Opioid use disorder.Zhang et al. [11] also showed that the roles of co-expression network analysis to identify hub DEGs associated with asthma severity.In this study, we constructed multiomics-based risk models in predicting asthma, as multiomics data potentially predict outcome more accurately  [25].In addition, Zhou et al. suggested that DNA methylation changes of SERPINE1 might be responsible for mediating the effect of genetic variate on the development of food allergy [17].Importantly, our study revealed that SERPINE1 was hypo-methylated in BECs and AECs and its expression mediates the methylation effect on asthma severity, and lung function.The validated co-methylated CpG site annotated to SLC9A3 is part of our asthma-risk predictive model.This gene is related to airway epithelium and immune regulation.Notably, our findings showed that expression level of SLC9A3 was associated with DNA methylation changes and mediated methylation effect on asthma severity and lung function association.Our finding aligned with previous genome-wide study in atopy [18], indicating expression changes of SLC9A3 mediate the methylation and atopy association.Other hyper-methylated validated genes were mapped to genes such as CD9, SCGN, KLF4, KCNK3 and HNF1A.Previous study reported deficiency of KLF4 compromises the lung function in an acute mouse model of allergic asthma [26].Besides, KLF4 regulates immune response-related genes including IL1RL1, CD274, and CD44 in human nasal epithelial cells of allergic rhinitis [27].In line with previous study, our study also found that KLF4 is significantly associated with asthma severity and lung function.
In addition, we found hypo-methylated KLF4 significantly associated with asthma severity and lung function, which suggests that methylation effect may regulate KLF4 in asthma and lung function disease development.
We also found that hyper-methylated CpG annotated to HNF1A was associated with asthma severity and reduced lung function.Our finding was supported by the previous study showing hyper-methylated HNF1A linked with asthma pathogenesis [28].
Our study has some limitations.Sample size of the DNA methylation and transcription expression data is relatively small.Hence, this study may not detect all association between DNA methylation CpG and gene expression changes and with clinical traits such as asthma status, severity and lung function measures.Since CpG can be influenced by lifestyle factors and environmental risk factors, in future, considering lifestyle and environmental risk factors and their association with methylation changes in asthma disease is recommended.The study was limited by datasets deposited in public domains.Our analysis is also limited to DNAm and DEGs, and future studies including genomics, proteomics and metabolomics need to be conducted.
Our study also has several strengths.First, we constructed co-methylation and co-expression modules associated with asthma severity and lung function measures and found that differentially co-methylation module CpGs and co-expression module DEGs enriched common biological pathways.Second, we developed MRS and TRS in asthma-specific tissue and further validated using asthma-relevant tissues [29].Third, to our knowledge, this is the first study that constructed multiomics asthma risk prediction model by integrating TRS and MRS in BECs and conducted mediational analysis to assess whether the methylation-asthma severity or lung function association is explained by transcriptomics changes.
In conclusion, DNAm is an epigenetic mechanism that plays a key role in regulating gene expression.An integrated analysis of DNAm and gene expression data identified several DNA methylation-regulated genes associated with asthma severity and lung function in BECs dataset and subset of which including cg01975495 (SERPINE1), cg10528482 (SLC9A3), cg25477769 (HNF1A) and cg26639146 (CD9), cg17945560 (TIN-AGL1) and cg10290200 (FLNC) were further confirmed in validation AECs dataset.The validation results showed that these methylation CpG sites/genes may serve for asthma diagnosis and the development of new epigenetic therapies for asthma.In addition, the integration of multi-omics data has expanded our understanding of how various omics data correlated with one another and inform better prediction of disease risk.

Datasets and filtering criteria
The overall pipeline of our study is shown in Fig. 1.Eligible and publicly available DNAm and gene expression datasets were selected based on the following criteria.
(1) Homo sapiens, (2) sample size ≥ 80, and (3) consists of DNAm and gene expression profiles of asthmatic and control subjects.DNAm data (accession: GSE201872) and RNA-seq data (accession: GSE201955) obtained from BECs that satisfied the inclusion criteria were selected and used as discovery datasets.Other datasets with DNAm (accession: GSE85568) and RNA-seq data (accession: GSE85567) obtained from AECs that satisfied the inclusion criteria were selected and used for validation datasets.Initially, DNAm and gene expression profiles and corresponding clinical information were downloaded from the identified Gene Expression Omnibus (GEO) database.
The quality controls and data normalization procedures for DNAm data (GSE201872) and RNA-seq data (GSE201955) were already conducted and described in the original study [30].
For GSE201872, a matrix of DNAm data with M-values of a total of 61,162 methylation CpGs in 142 samples (including 46 controls, 47 mild-moderate asthmatic and 49 severe asthmatic) based on both Illumina Infinium MethylationEPIC and Infinium Human-Methylation450K assays were obtained.We further normalized the between platform differences using surrogate variable analysis (SVA) [31] package.The GSE201955-RNA-seq data contain a total of 118 samples, which are subset of the 142 samples in the DNAm GSE201872 dataset, including 39 control, 39 mildmoderate asthmatic and 40 severe asthmatics and a total of 13,757 genes with normalized RNA-seq expression profiles.Therefore, there are a total of 118 samples with both DNAm and gene expression data available between the GSE201872 and GSE201955 datasets.Three outcome variables: asthma status only (cases vs. controls) for identification of DMCs and DEGs and the development of risk prediction models, asthma severity (no, step = 0, mild-moderate, step = 1, 2, 3, 4 and severe, step = 5, 6) and lung function measures (including FEV1 and FEV1/FVC) for associations with co-methylation and co-expression modules, were considered in this study.Other demographic and clinical variables such as age, gender, race, current smoking status, maternal asthma, BMI, IgE, BAL eosinophilia, BAL neutrophilia, and blood eosinophilia were included in our study.In addition to clinical and demographic variables, we included the three ancestry principal components (PCs) information, which were estimated in the original study [10].To replicate and validate our findings, DNAm data from GSE85568 (contained 71 asthmatic samples and 42 control samples) and gene expression data from GSE85567 (contained 57 asthmatic samples and 28 control samples) derived from AECs were used.Between DNAm data from GSE85568 and gene expression data from GSE85567 datasets, a total of 81 same samples have both methylation and gene expression data available.The summary of the datasets used in this study is shown in Additional file 2: Table S10.

Identification of differentially methylated and expressed genes
We conducted differential methylated and expressed analysis between asthmatic and control samples to identify DMCs and DEGs using Empirical Bayes linear model in the limma package [32] adjusting for age, gender, current smoking status and ancestry information.Differential methylation analysis on M-values, the Benjamini and Hochberg (B-H) [33]-based adjusted P value < 0.05 and an absolute effect size (|Δ meth |) > 1% were used to define DMCs.The effect size was defined based on betavalues as follows: (Δ meth ) = mean methy asthmatics − mean methy Control .Similarly, differential expression analysis was conducted to identify DEGs adjusting age, gender, and smoking status in the discovery dataset.We used an absolute value of log2 (fold change = gene expression asthamtics /gene expression controls ) > 0.1 and a significance threshold of adjusted P value < 0.05 based on B-H procedure to identify DEGs [33].

Weighted correlation analysis
We conducted weighted correlation network analyses to identify co-methylation and co-expression networks based on the connectivity of methylation of CpGs and gene expression profiles using WGCNA package [34].Initially, similarity matrix of DMCs or DEGs were constructed using pairwise correlation S ij = cor x i , x j , where x i and x j represent the ith row and the jth row of DNAm/gene expression data matrix X, respectively.The similarity matrix was transformed into an adjacency matrix, represented by A ij = cor x i , x j β , where the suitable soft-thresholding power β was set ranging from 1 to 20 utilizing the pick Soft Threshold function.Dynamic tree cutting algorithm was used to divide DMCs or DEGs into different groups of connected CpGs (co-methylation modules) or connected DEGs (co-expressed modules) based WGCNA signed network, respectively.Methylated CpGs/DEGs with similar methylation/expression patterns are clustered into the same module and may potentially share common biological roles [35].The eigenvectors of the co-methylation and co-expression modules were derived and examined for associations with clinical features of asthma including the asthma severity outcome (no, mild-moderate or severe asthma), lung function measures (FEV1 and FEV1/FVC), IgE, BMI, BAL eosinophilia, blood eosinophilia, maternal asthma status, age, gender, race, current smoking status and ancestry PCs.Lastly, modules significantly correlated with asthma severity and lung function measures were selected for further analysis(P value < 0.05).The DMCs within significant co-methylated modules and DEGs within co-expressed modules are hypothesized to have an influential role in regulating diseases [36].

Pathway analysis of module genes
The network module level analysis between co-methylated and co-expressed module genes was conducted to integrate DNAm and gene expression analysis.At the network module level, we used a hypergeometric test in the GeneOverlap [37] package to measure overlapping genes between CpG mapped genes in the co-methylation modules associated with asthma severity and lung function and DEGs in the co-expression modules associated with asthma severity and lung function.A hypergeometric test with P value < 0.05 was used to assess the significance of overlapping genes between co-methylation modules and co-expression modules.Then, pathway enrichment analyses by Ingenuity Pathway Analysis (IPA) software [38] were used to understand the biological function of genes in the co-methylated and co-expression modules that were associated with asthma severity and lung function.The IPA method evaluates proportional representation of module genes from a defined set in a canonical pathway in all set of known genes.Canonical pathways related to input module genes were elucidated with a ratio to evaluate significant pathway enrichment adjusted for multiple testing.The adjusted P value was calculated based on B-H procedure and biological functions with adjusted P value < 0.05 were defined as significant canonical pathways.

Construction and validation of multi-omics risk score models
We constructed three asthma risk prediction models: methylation-based risk score (MRS), transcriptomicbased risk score (TRS) and multi-omics risk model (MRS + TRS).Differentially co-methylated CpG sites and differentially co-expressed genes obtained from differential and WGCNA analysis with adjusted P value < 0.05 were included in supervised Boruta algorithm.The Boruta algorithm is type of machine learning technique, which was used to select important CpG sites and genes whose DNAm levels and gene expression levels are relevant for asthma risk prediction [39] using Boruta R package.Important CpG sites and genes yielded the 'confirmed' status in Boruta iterations were selected for subsequent analysis.Next, multiple logistic regression model was utilized to generate coefficients of the CpGs and genes, separately for asthma risk prediction.Then, MRS = Σβi × methyl-CpGi, where methyl-CpG is the M value and β is the regression coefficient from logistic regression analysis [40] and TRS = Σβi × exp-genei, where exp-gene is the normalized gene expression value and β is the coefficient from logistic regression analysis were constructed.Finally, multi-omics risk score model by integrating TRS and MRS was constructed using 118 samples having both DNAm and gene expression data in the discovery BCEs dataset.The prediction performances were assessed in the discovery BEC datasets using area under the ROC curve (AU-ROC) as implemented in pROC [41] R package.The risk prediction performances were validated using 81 samples having both DNAm and gene expression data in the validation AECs dataset.

Correlation analysis between CpG methylation level and expression level of DEGs
To evaluate whether identified differentially methylated CpGs are correlated with corresponding mapped DEGs, we performed correlation analysis using Pearson correlation coefficient ( R i ) as calculated below based on M-values of methylated CpGs and normalized expression values of DEGs [42] of 118 subjects in discovery BECs dataset.
Here, p ij represents the M-values of i-th CpG site the j-th subject, p i is the M-value of i-the CpG site over subjects; g ij represent the expression value of i-th gene in the j-th subject, g i is the mean expression value of i-the gene over subjects and R i is correlation coefficient between i-th CpG site M-values and i-th gene expression values, where the i-th CpG site is mapped to the i-th gene.Significant correlations between the differentially methylated CpGs and mapped DEGs in discovery BECs dataset were validated based on 81 samples having both DNA methylation and gene expression data in the independent validation AECs dataset.

Mediation analysis
After evaluating the correlation between methylated CpGs and the corresponding mapped DEGs in the discovery dataset, we conducted mediation analysis to explore whether any proportion of the association between methylated CpGs and asthma severity/FEV1 is mediated by DEG [43][44][45] among the subjects with both DNAm and gene expression data.In our mediation analysis, we constructed three models: including (1) mediator model (a-path) to evaluate the association between the gene expression and DNAm of CpG using linear regression model.(2) Outcome models (b-path and c′-path) to evaluate the effect of DNAm and gene expression on asthma severity using ordinal logistic regression or FEV1 using linear regression.(3) Total effect model (c-path) to evaluate the effect of DNAm on the asthma severity using ordinal logistic regression or FEV1 using linear regression.All models were adjusted for covariates: age, gender, and ancestry.The regression models used in our study are described in Fig. 1 and as follows.
The mediation effect of each CpGs was performed separately.The standardized coefficients were estimated based on 1000 nonparametric bootstrap resampling quasi-Bayesian approximation method using mediation package [46].The total effect from DNAm to asthma severity/FEV1 was decomposed into the indirect effect or mediated effect (ME) through gene expression (methylation CpGs → gene expression → asthma severity or FEV1) and the direct effect (methylation CpGs → asthma severity or FEV1) not mediated by gene expression but could be mediated by other factors or a direct link between DNA methylation changes and asthma severity or FEV1.The ME is equal to a × b, which is equivalent to c-c′ and denotes the part of the DNAm effect of each CpG on asthma severity or FEV1 mediated by gene expression.Significance of association was defined based a P value < 0.05 as cut-off.

Statistical analysis
All preprocessing and statistical analysis were performed using R software version 4.2.2 and associated Bioconductor packages.Comparisons of median among three groups were tested using pairwise median test.The ANOVA test was used to compare means of three groups using multcomp package.The Tukey Honest significant differences test was utilized to compare mean between two groups.Associations between two continuous variables were assessed by either Spearman or Pearson correlation coefficient.Statistical significance based on B-H adjusted P value < 0.05 was used, unless otherwise stated.

Fig. 2
Fig. 2 Identification of CpGs associated with asthma severity in the discovery BECs dataset.A Volcano plot showing CpGs associated with asthma.DMCs are identified as those with adjusted P value < 0.05 and absolute effect size (Δ meth ) > 1%.B Cluster dendrogram of four modules (blue, brown, turquoise, and gray) containing 1891 DMCs.C Correlation between co-methylation modules and various clinical measures

Fig. 3
Fig. 3 Identification of DEGs associated with asthma severity in the discovery BECs dataset.A Volcano plot showing DEGs related to asthma.Differentially expressed genes (DEGs) are those with adjusted P value < 0.05 and absolute value of fold change > 0.1.B Cluster dendrogram of 1891 DEGs revealed eight modules including turquoise, brown, green, blue, yellow, black, red, and gray.C Correlation between co-expression modules and various clinical measures

Fig. 4
Fig. 4 Pathways significantly enriched by genes in the A. turquoise, B. brown, and C. blue co-methylation modules that are significantly (P value < 0.01) associated with asthma severity and/or lung function

Fig. 5
Fig. 5 Pathways significantly enriched by genes in the A. turquoise, B. red, and C. green co-expression modules that are significantly (P value < 0.01) associated with asthma severity and/or lung function

Fig. 6 Fig. 7
Fig. 6 The risk prediction performance of different risk models including MRS, TRS and clinical biomarkers.A in the discovery BECs dataset B in the validation AECs dataset.MRS-methylation-based risk Score, TRS-transcriptomic-based risk score

Fig. 8
Fig. 8 Scatter plot showing the correlation between DNA methylation levels and gene expression levels for 81 samples with both DNA methylation and gene expression data in validation AECs dataset

Table 1
Clinical characteristics of 142 asthmatic and control subjectsAsthma severity is categorized based on STEP classification as mild, moderate, and severe asthma AA African American, EA European American, BMI body mass index, FEV1 forced expiratory volume at 1 s, FVC forced vital capacity, FEV1/FVC the ratio of FEV1 to FVC, IgE immunoglobulin E, BAL Bronchoalveolar lavage The significance levels of comparison (mean/median of mild-moderate asthmatics vs controls and severe asthmatics vs controls) were indicated by the stars (***P < 0.001, **P < 0.01, *P < 0.05).The P-values are for three-group (control, mild-moderate and severe)

Table 2
Mediation analysis of gene expression in the association between DNA methylation and asthma severity and lung function measure (FEV1) in the discovery BECs dataset ME denotes causal mediational effect (indirect effect of each CpG on asthma severity/FEV1 that goes through mediating gene).The values in bold show genes that are significantly mediating the DNA methylation effect on asthma severity/FEV1 association

Table 3
Validation of the methylation differences of CpGs between asthmatic subjects and controls in independent validationACEs dataset